ABSTRACT





To introduce students to the concepts of linear regression and the associated correlation in an
introductory course in Descriptive Statistics, we propose the use of historical problems related to
the subject, in particular the problems on genetic inheritance addressed by Galton at the end of the
19th century. In addition to placing students in the historical context in which these problems
emerged and acquainting them with the names of pioneering scientists, we solve the problems
intuitively: visualizing them graphically, testing different descriptive measures that the students
already know, and proceeding by trial and error with the support of a calculator and a spreadsheet.
The formalization of the theory is left for the end. The experience thus proposes the reverse of the
usual course: from practice to theory. First, students face a set of problems based on the
relationship between variables, a notion unknown to them until that moment, using the statistical
tools for a single variable, such as the mean and the standard deviation. Our role as teachers is to
introduce clues that help them overcome obstacles at a practical level. Those clues, once formalized,
become the theoretical basis of the chapter devoted to regression analysis. We then evaluate the
experience. The results, assessed in three ways (resolution of practical exercises, a student survey
and exam grades), show that the process has been positive for the ultimate goal of learning.


KEYWORDS: Regression, correlation, Galton, Pearson, spreadsheet. 


1. INTRODUCTION: 

Regression analysis is one of the fundamental pillars of the syllabus of any subject
that introduces students to statistical techniques. It is also well known that these
techniques are valuable instruments for the professional development of graduates in
Business Administration and Management. Cause-effect associations, relationships
between variables, the strength of those relationships, and the prediction of behavior
from a few clues all form a set of ideas that, at a statistical level, belong to the
broad concept of "regression". In introductory courses the concept is presented
descriptively, using observed data. Later, once students begin the calculus of
probabilities and, therefore, theoretical probabilistic models, regression returns to
the syllabus, now between random variables; here, of course, the concepts already
acquired at the descriptive level are fundamental. Finally, in Econometrics, which is
part of the curriculum of the aforementioned degree, regression is the basis of most
econometric analysis techniques. We are therefore aware of the importance of this
chapter, and we want the absorption of knowledge and ideas to be complete and, in
addition, attractive. Of particular interest in this respect is the work of Stanton
(2001), which presents a teaching approach based on historical data and has served as
a guide for the development of this experience.


Traditionally, this chapter has been taught in the following order: first, the linear
correlation coefficient is introduced, that is, the concept of linear association
between variables (with its associated formula), and then the regression line, with
the theoretical calculation of its coefficients and of measures of goodness of fit.
All this is accompanied by important and sometimes tedious (although necessary)
algebraic developments that justify the different formulas. The connection between
correlation and regression is obscured because it arises at the algebraic level more
than at the level of ideas. The topic is completed with a series of practical exercises
and problems, first worked by the teacher and then set for the students, who have to
show the knowledge and skills acquired during the teacher's theoretical and practical
explanations. The entire learning process runs vertically, from top to bottom, from
teacher to student, with little reflection on or maturation of the new and important
ideas. The vast majority of published textbooks on the subject follow that path. Our
experience as teachers (up to 20 years of teaching) shows a somewhat somber and
unsatisfactory scenario, because the topic is left somewhat adrift in the immense
ocean of statistics.


2. MATERIALS AND METHODS: 

These considerations led us, some time ago, to look for teaching alternatives. One of
them (the one we have developed here) is the same route by which Galton himself arrived
at these concepts at the end of the 19th century. Galton, a cousin of Darwin and a
recognized scientist of that century in his own right, has often been criticized for
his commitment to eugenics. On the other hand, there are those who believe that the
lasting fame of his cousin has unjustly overshadowed Galton's important scientific
contributions to biology, psychology and applied statistics. His passion for genetics
and, in particular, for problems of inheritance is what led him to think about methods
of calculation such as regression and correlation. The reflections that took him into
this field began with an inheritance problem that was complicated at the time:
understanding how strongly a characteristic of one generation of living beings
manifests itself in the next. Galton first approached this problem by examining
characteristics of pea seeds. He chose the pea because this species can self-fertilize:
the daughter plants show genetic variations of the mother plants without the
contribution of a second parent. In this way, Galton postponed the problem of
statistically assessing genetic contributions from several sources. His first idea
about regression came from a graph, a two-dimensional diagram in which the sizes of the
offspring peas were plotted against those of the parent peas. Galton realized that the
median diameter of the offspring seeds for a particular diameter of the parent seed
describes, approximately, a straight line with a positive slope less than 1. He used
the representation of his data to illustrate the basic foundations of what
statisticians still call regression. From here, with the errors typical of any
incipient research process, Galton began to build a whole theory that was later
formalized mathematically by one of his disciples, Pearson.


The teaching approach of the experience has thus been a mixture of "problem-based
learning" and "historical birth and development": pragmatism and history. Mathematical
modeling comes a posteriori. We invert the order, from practice to theory and from
students to teacher: problems motivate, students think and propose solutions, and the
teacher supervises and guides. In this way we try to improve the understanding of the
fundamentals and to stimulate the students' interest by showing them the problems that
Galton and other early researchers confronted and solved when they initiated
techniques that are so widely used today.


The experience was carried out with a group of about 80 students of the Degree in
Business Administration and Management, in the subject Statistics I, an introductory
course in statistics most of whose content concerns the techniques and methods of
Descriptive Statistics.


The path followed was therefore as follows:


1. First, some biographical aspects of Sir Francis Galton are reviewed. Students
are referred to the website recognized as official on the life and work of this
author: http://galton.org/. This allows a historical account of scientific and
mathematical progress in the late nineteenth and early twentieth centuries. In
this context, we describe the scientific background with which Galton
confronted the problem and explain how his limited mathematical training
initially restricted the analytical development of regression. Students also
learn about the most pressing scientific problems of the end of the 19th
century, genetic inheritance being one of them.


2. Some of the historical examples are used in the classroom, with the same data
as these pioneers. The first data set offered to the students is the one Galton
worked on in his famous book Natural Inheritance (1894). In his four
biographical volumes, Pearson describes the genesis of the discovery of the
regression slope (Pearson 1930). In 1875 Galton distributed packets of pea
seeds among seven friends; each of them received seeds of uniform diameter
(see also Galton 1894), but there were substantial differences between the
packets. Galton's friends harvested the seeds of the new generation and
returned them to him. The author collected the diameters of this second
generation, cross-classified with those of the parents, in the following table,
which is the first one provided to the students:


NATURAL INHERITANCE.

Table 2.
Parent Seeds and their Produce.

The proportionate number of sweet peas of different sizes, produced by parent
seeds also of different sizes, are given below. The measurements are those of
their mean diameters, in hundredths of an inch.

[Image of the table. Column headings: Diameters of Filial Seeds (class columns,
among them 16-, 18-, 19-, 20-), Total, and Mean Diameter of Filial Seeds
(Observed, Smoothed). Only the last two columns are legible in this copy.]

Mean Diameter of Filial Seeds
Observed    Smoothed
17.5        17.3
17.3        17.0
16.0        16.6
16.3        16.3
15.6        16.0
16.0        15.7
15.3        15.4

Figure 1: Image of the table published by Galton on the diameter of
parent pea seeds versus offspring pea seeds.





With the data from that table, we propose the construction of an Excel-type database
with three columns: the two variables to be related (the diameter of the parent seed
and that of the offspring seed) and, as the third column, the absolute frequency, that
is, the number of times that a particular pair occurs.


Since Galton presents the data for the offspring seeds grouped in intervals, a "class
mark" is selected for each interval. An example of the construction of this database
is the following table:


Table 1: Data in Excel from the table published by Galton on the diameter of
parent pea seeds versus offspring pea seeds. (? = entry not legible in this copy.)

Parent diameter    Offspring diameter (class mark)    Absolute frequency
21                 14.5                               22
21                 15.5                               ?
21                 16.5                               ?
21                 17.5                               18
21                 18.5                               21
21                 19.5                               13
21                 20.5                               6
21                 22.5                               2
20                 14.5                               ?
20                 15.5                               10
20                 16.5                               12
20                 17.5                               ?
20                 18.5                               20
20                 19.5                               13
20                 20.5                               3
20                 ?                                  ?
19                 14.5                               35
19                 15.5                               16
19                 16.5                               12
19                 17.5                               ?
19                 18.5                               11
19                 19.5                               10
19                 20.5                               ?
19                 ?                                  1
18                 14.5                               34
18                 15.5                               12
18                 16.5                               13
18                 ?                                  17
18                 ?                                  ?
18                 19.5                               ?
18                 20.5                               ?
17                 ?                                  37
17                 ?                                  ?
17                 15.5                               13
17                 16.5                               16
17                 17.5                               13
17                 18.5                               4
17                 19.5                               ?
16                 14.5                               34
16                 15.5                               ?
16                 16.5                               18
16                 17.5                               16
16                 18.5                               13
16                 19.5                               3
16                 20.5                               ?
15                 13.5                               46
15                 14.5                               14
15                 15.5                               9
15                 16.5                               11
15                 17.5                               14
15                 ?                                  4
15                 19.5                               2

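
The construction of the three-column database described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cross-table counts are invented (they are not Galton's), and the names `cross_table`, `class_mark` and `rows` are hypothetical.

```python
# Hypothetical cross-table: parent diameter -> {(lower, upper) interval of the
# offspring diameter: absolute frequency}. Invented numbers, not Galton's data.
cross_table = {
    20: {(15, 16): 3, (16, 17): 5, (17, 18): 2},
    18: {(15, 16): 6, (16, 17): 3, (17, 18): 1},
}

def class_mark(interval):
    """Midpoint of an interval, used as its representative value."""
    lower, upper = interval
    return (lower + upper) / 2

# Three-column "database": (parent diameter, offspring class mark, frequency).
rows = [
    (parent, class_mark(interval), count)
    for parent, counts in cross_table.items()
    for interval, count in counts.items()
]

for row in rows:
    print(row)

# The frequencies add up to the number of observed seeds.
total = sum(freq for _, _, freq in rows)
print("number of observations:", total)
```

Keeping the frequency as its own column, rather than repeating each pair, mirrors the spreadsheet layout and makes the weight of each point explicit.
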
In a graph, Galton plotted the diameters of the parent peas against those of the
offspring. As already mentioned, he discovered that the median diameters of the
offspring seeds for each specific diameter of the parent seed describe, approximately,
a straight line with a positive slope less than 1. Thus he naturally found a first
regression line and also a constant variability of the second character for any given
value of the first. Perhaps the study of this simple special case was the best for the
progress of correlational calculus, given how easily a beginner can understand it.
Following this process of conceptualization, graphical methods are used first. We show
the students the scatter diagram of these data, which Excel can draw. In the classroom
there is a computer for the teacher, with screen projection, and most students bring a
laptop to class. To ease the theoretical introduction of the concept, we recommend
standardizing both variables, so that both are centered at the origin of coordinates
(the point of means becomes (0,0)) and have the same variability (their standard
deviations become 1); in this way differences in magnitude or scale do not interfere
with the analysis of the possible relationship, either by masking or by exaggerating
it. We do warn students that, in the case at hand, the two variables are very similar
in center and variability, so standardization might not have been necessary. With the
standardized variables we plot and thus visualize the relationship, inviting students
to look for simple functions (straight lines) that reflect as well as possible what
the scatter diagram conveys. We label each point of the diagram with its absolute
frequency. This value tells us the weight that the point must have in the analysis of
the relationship.
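
As an illustration of the standardization step with frequency-weighted data, the following Python sketch may help. It assumes a small invented data set (not Galton's) and uses the population standard deviation; the function names are hypothetical.

```python
import math

# Hypothetical (parent, offspring, frequency) rows; invented, not Galton's data.
rows = [(15, 15.5, 4), (15, 16.5, 2), (18, 15.5, 2),
        (18, 17.5, 4), (21, 16.5, 3), (21, 18.5, 5)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_std(values, weights):
    """Population standard deviation with absolute frequencies as weights."""
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)

def standardize(values, weights):
    """Subtract the weighted mean and divide by the weighted standard deviation."""
    m, s = weighted_mean(values, weights), weighted_std(values, weights)
    return [(v - m) / s for v in values]

parents = [p for p, _, _ in rows]
children = [c for _, c, _ in rows]
freqs = [f for _, _, f in rows]

zx = standardize(parents, freqs)
zy = standardize(children, freqs)

# After standardization each variable has (weighted) mean 0 and deviation 1,
# up to floating-point rounding.
print(weighted_mean(zx, freqs), weighted_std(zx, freqs))
print(weighted_mean(zy, freqs), weighted_std(zy, freqs))
```
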


[Scatter plot. Vertical axis: standardized diameter of the offspring pea.
Horizontal axis: standardized diameter of the parent pea.]

Figure 2: Scatter diagram of the standardized Galton data, with the
absolute frequencies of the corresponding pairs.


Drawing by eye a line that runs among the points and comes as close as possible to all
of them, that passes through the origin of coordinates (the "center" of the diagram),
and that takes into account the "weight" of each point is what Galton tried as a first
approximation, and it is what the plot of the data naturally suggests. Once each
student has drawn his or her own line, he or she is invited to calculate its slope, an
easy calculation since it is the ratio of the vertical distance to the horizontal
distance. The student then notices two details: the line is increasing (the points in
the third quadrant carry a lot of weight), but its slope is gentle (much smaller than
1, the slope of the bisector of the 1st and 3rd quadrants). The first detail tells us
that there is a direct relationship between the two variables: in general, small
parent diameters correspond to small offspring diameters and large parent diameters to
large offspring diameters. The second suggests that large variations in the parent
diameters translate into smaller variations in the second generation, that is, into
values closer to the center (what Galton called "regression to the mean", which gave
its name to the whole theory).


A second graph is then proposed. The students already know how to construct box
plots. In this second graph the median plays the role that the mean played in the
previous one. We explain that Galton's first attempts at constructing his regressions
used the median which, although he found it more intuitive, presents algebraic
difficulties that hinder the associated calculations.


In any case, this visualization corroborates the idea that was already forming: a line
drawn among the boxes would pass through the origin of coordinates and would have a
positive slope, but less than one:
[Box plots. Vertical axis: standardized diameter of the offspring pea.
Horizontal axis: standardized diameter of the parent pea.]


Figure 3: Box plot of the standardized Galton data.
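
The median-based view can also be checked numerically. The sketch below computes, for each parent value, the median of the offspring values repeated according to their frequencies, and then the slope between the extreme medians. The data are invented for illustration (not Galton's), and `weighted_median` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (parent, offspring class mark, frequency) rows; not Galton's data.
rows = [
    (15, 14.5, 5), (15, 15.5, 3), (15, 16.5, 1),
    (18, 14.5, 3), (18, 15.5, 3), (18, 16.5, 3),
    (21, 15.5, 2), (21, 16.5, 3), (21, 17.5, 4),
]

def weighted_median(pairs):
    """Median of values repeated according to their absolute frequencies."""
    expanded = sorted(v for v, f in pairs for _ in range(f))
    n = len(expanded)
    mid = n // 2
    if n % 2:
        return expanded[mid]
    return (expanded[mid - 1] + expanded[mid]) / 2

# Group the offspring values by parent diameter.
by_parent = defaultdict(list)
for parent, child, freq in rows:
    by_parent[parent].append((child, freq))

medians = {p: weighted_median(pairs) for p, pairs in sorted(by_parent.items())}
print(medians)

# Rise over run between the smallest and largest parent diameters.
lo, hi = min(medians), max(medians)
slope = (medians[hi] - medians[lo]) / (hi - lo)
print("slope of the medians:", slope)
```

With these invented numbers the medians increase with the parent diameter, but by less than the parent diameter itself, which is the pattern the box plots are meant to reveal.
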


Using first a scientific calculator and then an Excel spreadsheet, the
students estimate the "slope" that will, in some way, define the possible
relationship. The ease of use of these tools makes it possible to repeat
calculations and even to try different measures until the best
approximations are found; that is, we apply trial and error. On this
occasion the students' approximations gave slopes between 0.30 and 0.50.
Galton called that slope r (for regression).


It is now time to formalize the idea. Intuition suggests that what has been
posed is an optimization problem: to obtain the line that best represents the
point cloud, that is, the line that minimizes the sum of the squared distances
from the points of the scatter diagram to the line. The problem recalls one
already solved for one-dimensional variables: minimizing the squared
distances from the values of a distribution to a central value or, in other
words, replacing the values of the distribution by a single representative
value. The solution is given by the König theorem (already known to the
students at this stage), which leads directly to the arithmetic mean as the
representative value that is optimal in the sense of distances. It only
remains to make the jump to the two-dimensional case: instead of one variable,
we have two. Generalizing, the optimal solution is again an arithmetic mean,
in this case that of the variable obtained by forming the cross products of
the original (standardized) variables. Therefore the solution, the desired
slope, is the mean of the cross products of the two variables, what Pearson
called the "product-moment". We can then write $r = \overline{xy}$, and the
equation of the line that best "fits" the point cloud is $y = rx$, where y
represents the standardized diameter of the offspring (the effect, explained
or dependent variable) and x the standardized diameter of the parents (the
cause, explanatory or independent variable). The line, as expected, passes
through the origin of coordinates. For the standardized Galton data we obtain
r = 0.346. The fitted line is then drawn through the point cloud:


[Scatter plot with the fitted line. Horizontal axis: standardized diameter of
the parent pea.]


Figure 4: Scatter diagram of the standardized Galton data together with
the straight line whose slope is the product-moment.
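
A short numerical check of this result, on invented data rather than Galton's, is sketched below. For a line through the origin, minimizing the sum of squared vertical distances gives the slope sum(xy) / sum(x^2); for a standardized x the denominator equals n, so the slope reduces to the mean of the cross products. The variable names are illustrative assumptions.

```python
import math

# Hypothetical paired observations (not Galton's data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)

def standardize(values):
    """Center at the mean and scale by the population standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

zx, zy = standardize(x), standardize(y)

# Product-moment: mean of the cross products of the standardized variables.
r = sum(a * b for a, b in zip(zx, zy)) / n

# Least-squares slope of a line through the origin for the standardized data.
slope_origin = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print("r =", r)
print("least-squares slope through the origin =", slope_origin)
```
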
With several simple examples, each with few data, we show the effect of the
different variability of x and y on the slope of the line. Starting from the
slope initially constructed for two variables with equal dispersion, the
objective is to derive the resulting slopes when the dispersion of x is
greater than that of y, when both are equal (as with the standardized data
above) and when the variability of x is less than that of y. The student
grasps how these variabilities, measured by the standard deviations, make the
line flatten (when $S_x > S_y$) or steepen (when $S_x < S_y$). Then, after
various trial calculations and relying on the previous optimization, we arrive
at the equation $y = r\,(S_y/S_x)\,x$.
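
The three situations can be reproduced with a small Python sketch. It assumes invented data and obtains the three cases by rescaling one variable, which changes its standard deviation but not the correlation; the helper names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    return covariance(x, y) / (sd(x) * sd(y))

def ls_slope(x, y):
    """Least-squares slope of y on x."""
    return covariance(x, y) / sd(x) ** 2

# Hypothetical base data (not Galton's); here S_x equals S_y.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0]
base_y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Rescaling a variable changes its spread but leaves the correlation unchanged.
cases = {
    "S_x > S_y": ([3 * v for v in base_x], base_y),
    "S_x = S_y": (base_x, base_y),
    "S_x < S_y": (base_x, [3 * v for v in base_y]),
}

results = {}
for label, (x, y) in cases.items():
    r = corr(x, y)
    # (correlation, least-squares slope, r * S_y / S_x)
    results[label] = (r, ls_slope(x, y), r * sd(y) / sd(x))
    print(label, [round(v, 3) for v in results[label]])
```

In each case the fitted slope agrees with r multiplied by the ratio of standard deviations, while r itself stays the same.
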


From this point the student begins to distinguish the two fundamental
parameters of regression analysis: $r$, which we call the correlation
coefficient (the degree of linear relationship between the two variables), and
$b = r\,\frac{S_y}{S_x}$, the slope of the line, in which the quotient of the
standard deviations of the two related variables appears. The three previous
examples have been prepared so that they have the same value of $r$, so that
the beginner grasps that the change in slope is due, in these cases, to the
difference in the variability of the two variables. We make the student aware
of the difference between correlation and slope, and lead him or her to accept
the quotient $\frac{S_y}{S_x}$ as a corrector or equalizer between the two
terms. Incidentally, a proportion follows: the slope is to the standard
deviation of y as the correlation coefficient is to that of x or,
equivalently, the ratio of slope to correlation coefficient equals the ratio
of the standard deviation of y to that of x: $\frac{b}{r} = \frac{S_y}{S_x}$.

Returning to the original variables, by undoing the standardization, brings out
in a simple way the constant or independent term of the line.
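
One way to write out this step is sketched below. The notation is an assumption for illustration: $\bar{x}$ and $\bar{y}$ denote the means, $S_x$ and $S_y$ the standard deviations, $b$ the slope and $a$ the constant term of the line in the original units.

```latex
\begin{aligned}
\frac{y - \bar{y}}{S_y} &= r\,\frac{x - \bar{x}}{S_x} \\
y &= \bar{y} + r\,\frac{S_y}{S_x}\,(x - \bar{x}) \\
y &= a + b\,x, \qquad b = r\,\frac{S_y}{S_x}, \qquad a = \bar{y} - b\,\bar{x}
\end{aligned}
```

The first line is the standardized fit written with the original variables; solving for $y$ shows that the line passes through the point of means $(\bar{x}, \bar{y})$.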


With Excel we calculate the fitted values and the errors or residuals (the
differences between observed and fitted values). It is easy to check in the
spreadsheet that the residuals add up to zero: with the fitted line, positive
and negative errors cancel out completely. But the errors exist and can be
larger or smaller depending on the dispersion of the point cloud. Hence the
need for a measure of goodness of fit. It must be an aggregate, a summary
measure of the errors. Since errors can be positive or negative, a quadratic
measure avoids the cancelling effect of the sign. The sum of squared errors or
residuals, SCR, is then proposed: it is 0 when the fit is perfect and grows
with the dispersion in the diagram. One way to build confidence in the result
is to ask the student to try other values for the slope of the line, that is,
other fits, and to calculate the corresponding SCR, showing that this
aggregate of errors is always greater than for the fitted line. The SCR is an
absolute measure of goodness of fit and, of course, depends on the scale of
the dependent variable. On the other hand, we can calculate the variability of
that variable through its variance, which we denote by VT (total variance);
the variability of the fitted or explained values, VE (explained variance);
and, finally, the variability of the errors or residuals, which is simply the
SCR averaged over the sample size and which we denote by VR (residual
variance). The calculations lead the student to check the following intuitive
relationship, $VT = VE + VR$, the fundamental basis of regression
calculations. It leads us to propose a second measure of goodness of fit, this
time relative and therefore useful for comparing fits: the "proportion of
variance explained". We call it the coefficient of determination, and it is
defined by the equality $R^2 = \frac{VE}{VT}$.


The student is surprised to find that this coefficient coincides with the
square of r, which in turn justifies the symbol used to represent it.
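
These spreadsheet checks can be mirrored in a short Python sketch. It assumes invented data (not Galton's) and population variances; the acronyms SCR, VT, VE and VR follow the text, while the function and variable names are hypothetical.

```python
import math

# Hypothetical paired observations (not Galton's data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

mx, my = mean(x), mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
slope = cov / variance(x)
intercept = my - slope * mx
r = cov / math.sqrt(variance(x) * variance(y))

fitted = [intercept + slope * a for a in x]
residuals = [obs - fit for obs, fit in zip(y, fitted)]

def scr(b1):
    """Sum of squared residuals for a line of slope b1 through the point of means."""
    return sum((yi - (my + b1 * (xi - mx))) ** 2 for xi, yi in zip(x, y))

VT = variance(y)        # total variance
VE = variance(fitted)   # explained variance
VR = scr(slope) / n     # residual variance
R2 = VE / VT            # coefficient of determination

print("sum of residuals:", sum(residuals))
print("SCR for the fitted slope:", scr(slope))
print("SCR for other slopes:", {b1: scr(b1) for b1 in (0.5, 0.7, 0.9, 1.0)})
print("VT, VE + VR:", VT, VE + VR)
print("R2, r squared:", R2, r ** 2)
```

Trying other slopes, as the students do by trial and error, only increases the SCR in this example.
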


The jump to multiple regression is natural and, for the student, almost
necessary once he or she is aware of the need to introduce more than one
factor influencing the target variable. We illustrate how Galton realized,
shortly after collecting and analyzing his data on peas, that generations
earlier than the immediate parents can also influence individual
characteristics (Pearson, 1930). He points out that certain characteristics
even skip one or more generations occasionally; a man may be more like his
grandfather than his father in certain respects. In an 1898 article in the
journal Nature (cited in Pearson, 1930), Galton published an ingenious diagram
that divided a unit square into successively smaller squares, each
representing the diminishing influence of earlier generations of ancestors on
the current individual. Galton had come up with the germ of the idea of
multiple regression: a characteristic or variable can be influenced not by a
single important cause but by many causes of greater and lesser importance.
Some of these causes may even overlap (that is, the explanatory variables are
correlated with each other). In later publications Galton gave some
mathematical formulas that captured this same basic idea, but he was never
able to develop a complete mathematical treatment of the subject:


"The somewhat complicated mathematics of multiple correlation, with its
repeated appeals to the geometric notions of hyperspace, left him a closed
room." (Pearson 1930, p.21)
However, Galton's conceptualization of the multiple influences of ancestors
on the characteristics of the present individual was completely parallel to
the modern conception of multiple regression. As with simple linear regression
and the correlation coefficient, Galton did the imaginative preliminary work
that Pearson later developed with a rigorous mathematical treatment. Pearson's
subsequent work included the further development of multiple regression as
well as innovative progress on other statistics. Then, using Pearson's
formulation (updated) and examples, we introduce the students to multiple
regression.


3. RESULTS AND DISCUSSIONS: 

Once the experience was over, the teachers involved had three instruments to evaluate
its result: the work done by the students during the classes, the student survey, and
the grade in the exam on this subject. In Spain, exam grades range from 0 (minimum) to
10 (maximum). An exam is passed when its grade is greater than or equal to 5.


Other practical examples were proposed, all taken from the history of the discipline;
some were worked through in class, with the assistance and supervision of the teachers.
In addition to the appropriate calculations, students were asked to prepare a report on
the conclusions drawn from those calculations. In this way we took the opportunity to
introduce them to the drafting of this type of report. The reports, written in Word and
sent by e-mail, were graded on a scale of 0 to 10. The grades yielded the following
results:


• The average score was 7.71, with a standard deviation of 2.03.


• The 50th percentile, or median, was 8.50, while the 75th percentile was 9.1,
which indicates that most grades lie between very good (notable) and
outstanding.


• A histogram of these grades is shown in Figure 5.


The student survey was carried out at the end of the experience. Students were
presented with a series of statements and indicated their position, from "total
disagreement" to "total agreement", on a five-category Likert scale. At the end of the
survey they were asked for a global assessment of the experience through a rating on a
scale of 0 to 10.





[Histogram. Axis label: grade for the practical work.]


Figure 5: Histogram of the grades for the practical work. Own
elaboration based on student survey data.





The following tables show the percentage results for some of the statements in the
survey.


Statement: The historical context has helped me understand the reason for
regression.


Table 2. Distribution of answers to question 1. Own elaboration based on
student survey data.

Possible answers              Percentage
Neither agree nor disagree    [illegible]
Agree                         [illegible]
Totally agree                 40.0














Statement: The experience has helped me become fluent in handling Excel.
Table 3. Distribution of answers to question 2. Own elaboration based on
student survey data.

Possible answers              Percentage
Disagree                      [illegible]
Neither agree nor disagree    [illegible]
Agree                         [illegible]
Totally agree                 [illegible]









































Statement: Learning Statistics is more motivating when historical contexts are
used.


Table 4. Distribution of answers to question 3. Own elaboration based on
student survey data.

Possible answers              Percentage
Neither agree nor disagree    [illegible]
Agree                         30.0
Totally agree                 28.6








The rating that the students gave to this experience (on a scale from 0 to 10) is
summarized by the statistics in the following table.


Table 5. Rating of the experience by the students. Own elaboration
based on student survey data.

Arithmetic average        [illegible]
Median                    8.0
Mode                      8.0
Standard deviation        0.7
Quartiles          25     7.0
                   50     8.0
                   75     8.0














Finally, we show what perhaps interests us most as teachers: whether the experience has
contributed positively to the students' learning. Our means of evaluation is the
written exam, similar in structure and content to those of previous years. The most
important statistics of the exam grades for this subject are the following:


Table 6. Exam grades. Own elaboration based on student survey data.

Arithmetic average        6.04
Median                    6.10
Mode                      10.0
Standard deviation        2.39
Quantiles          25     3.90
                   33     5.00
                   50     6.10
                   61     7.00
                   75     [illegible]














Therefore, 67% of the students passed the subject, and almost 40% obtained a grade of
very good (notable) or outstanding. We took the grades of this same group in the
previous year and compared the grades of the two consecutive years by means of a t test
for independent samples. The results are as follows:


Table 7. Comparison of results of two consecutive years. Own elaboration
based on student survey data.

Year    Arithmetic average    Standard deviation
2017    3.93                  [illegible]
2018    6.04                  2.39











Table 8. t test for equality of means. Own elaboration based on student
survey data.

Levene's test for the
equality of variances     t test for equality of means
F        p-value          t         df    p-value        Difference     Standard error
                                          (two-sided)    of means       of the difference
0.989 [column unclear]    -4.131    84    0.000          [illegible]    [illegible]
Assuming equal variability in the grades of the two years (which Levene's test
confirms), the difference between the average grades, slightly more than 2 points on a
scale of 0 to 10 in favor of the last year (when the experience was carried out), is
statistically significant (p < 0.001) according to this Student's t test.
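
For readers who want to see how such a comparison is computed, here is a minimal sketch of the equal-variance (pooled) t statistic for two independent samples. The grades are invented, since the individual grades are not given in the text; the p-value is left out because the Python standard library has no Student t distribution.

```python
import math

# Hypothetical exam grades for two consecutive years (invented numbers).
grades_prev = [3.0, 4.5, 2.0, 5.0, 4.0, 3.5]
grades_curr = [6.0, 7.5, 5.0, 8.0, 6.5, 5.5]

def mean(values):
    return sum(values) / len(values)

def sample_var(values):
    """Sample variance with n - 1 in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

n1, n2 = len(grades_prev), len(grades_curr)
m1, m2 = mean(grades_prev), mean(grades_curr)

# Pooled variance: assumes both years share the same population variance.
pooled_var = ((n1 - 1) * sample_var(grades_prev)
              + (n2 - 1) * sample_var(grades_curr)) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (m1 - m2) / se_diff   # negative when the later year scores higher
df = n1 + n2 - 2

print("difference of means:", m1 - m2)
print("standard error of the difference:", se_diff)
print("t =", t_stat, "with", df, "degrees of freedom")
```

Under this equal-variance form the degrees of freedom are n1 + n2 - 2, so the 84 degrees of freedom reported in Table 8 would correspond to 86 grades over the two years.
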


We are not so bold as to think that the difference between the average marks of the two
years is due exclusively to the experience described. We are well aware, and our
experience as teachers tells us so, that in each group and each year many factors
intervene, some known to the teacher and others not, that influence exam grades. It is
therefore difficult for us to assess how much the experience weighed in that
difference. We think that the results presented in this section give a view, even if
approximate, that supports a positive assessment of this proposal.


4. CONCLUSION: 

The history of scientific progress is a good teaching resource. Confronting students
with the same problems through which pioneering scientists laid the foundations of new
theories is another way of motivating learning. If we add an intuitive process of trial
and error, with different approaches to the solution of the problem made possible by a
spreadsheet, we can reinforce the understanding of the subject taught. Theoretical
formalism comes afterwards. In short: intuition and calculation, observation and
reflection, retracing a path that scientists of great stature followed in their time.


REFERENCES: 
1. DUKE, J. D. (1978). Tables to Help Students Grasp Size Differences in Simple
Correlations. Teaching of Psychology, 5, 219-221.

2. FITZPATRICK, P. J. (1960). Leading British Statisticians of the Nineteenth Century.
Journal of the American Statistical Association, 55, 38-70.

3. GALTON, F. (1894). Natural Inheritance (5th ed.). New York: Macmillan and Company.

4. GOLDSTEIN, M. D., STRUBE, M. J. (1995). Understanding Correlations: Two Computer
Exercises. Teaching of Psychology, 22, 205-206.

5. KARYLOWSKI, J. (1985). Regression Toward the Mean Effect: No Statistical Background
Required. Teaching of Psychology, 12, 229-230.

6. PEARSON, E. S. (1938). Mathematical Statistics and Data Analysis (2nd ed.).
Belmont, CA: Duxbury.

7. PEARSON, K. (1896). Mathematical Contributions to the Theory of Evolution. III.
Regression, Heredity and Panmixia. Philosophical Transactions of the Royal Society
of London, 187, 253-318.

8. PEARSON, K. (1922). Francis Galton: A Centenary Appreciation. Cambridge University
Press.

9. PEARSON, K. (1930). The Life, Letters and Labors of Francis Galton. Cambridge
University Press.

10. STANTON, J. M. (2001). Galton, Pearson, and the Peas: A Brief History of Linear
Regression for Statistics Instructors. Journal of Statistics Education, 9(3).


International Education & Research Journal [IERJ] 


